Efficient Diskless Checkpointing and Log Based Recovery Schemes

Authors

  • Subba Rao
  • Sai Krishna
Abstract

Checkpointing and message logging are popular, general-purpose tools for providing fault tolerance in distributed systems. Diskless checkpointing schemes enable frequent checkpointing without a performance penalty. The present work extends James S. Plank's diskless checkpointing scheme (N+1 Parity) by introducing a 'Timeout' mechanism to checkpoint programs with high locality of reference. This mechanism enables applications with high locality of reference to take checkpoints periodically. A limitation of the N+1 Parity scheme is that all processes freeze their computation while taking synchronous checkpoints. The proposed scheme solves this problem by introducing a new message logging technique, partial message logging, which allows asynchronous checkpointing at both sender and receiver. The correctness of the scheme is established through a set of proofs. The paper also evaluates the performance of the proposed scheme on a distributed simulator test-bed. The results indicate that the proposed scheme outperforms the N+1 Parity scheme.
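
As a rough illustration of the N+1 Parity idea the abstract builds on, the sketch below assumes each of the N application processes holds its checkpoint in memory as a byte string and that an extra parity process stores the bytewise XOR of all of them. The function names (`xor_bytes`, `parity_checkpoint`, `recover`) are hypothetical and chosen for this example only; the sketch does not model the paper's 'Timeout' mechanism or partial message logging.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_checkpoint(checkpoints: list[bytes]) -> bytes:
    """Build the parity block held by the extra (N+1)-th process.

    Assumption: shorter checkpoints are zero-padded to the longest one.
    """
    size = max(len(c) for c in checkpoints)
    padded = [c.ljust(size, b"\0") for c in checkpoints]
    return reduce(xor_bytes, padded)


def recover(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost checkpoint from the survivors and the parity.

    The result has the padded length; a real system would also record
    each checkpoint's true size.
    """
    padded = [c.ljust(len(parity), b"\0") for c in survivors]
    return reduce(xor_bytes, padded, parity)


# Three in-memory checkpoints of equal size (illustrative data only).
checkpoints = [b"alpha", b"bravo", b"delta"]
parity = parity_checkpoint(checkpoints)

# Suppose process 1 fails: its checkpoint is rebuilt from the other two.
restored = recover([checkpoints[0], checkpoints[2]], parity)
```

Because XOR is its own inverse, a single parity block can restore any one lost checkpoint, which is why the scheme tolerates one failure at a time.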

Similar resources

Using two-level stable storage for efficient checkpointing - Software, IEE Proceedings- [see also Software Engineering, IEE Proceedings]

Checkpointing and rollback recovery is a very effective technique for tolerating failures. Usually, checkpoint data is saved on disk; however, in some situations the time to write the data to disk can represent a considerable performance overhead. Alternative solutions make use of main memory to maintain the checkpoint data. The paper starts by presenting two main memory ch...

Enhanced Two-level Fault Recovery Scheme Combined with Message Logging

Checkpointing schemes facilitate fault recovery in distributed systems. The two-level fault recovery scheme for distributed systems inherits the merits of both disk-based and diskless checkpointing schemes. The present work extends James S. Plank's diskless checkpointing scheme (N+1 Parity) by introducing a 'Timeout' to checkpoint programs with high locality of reference. This mechanism enables ap...

Adaptive Checkpointing

Checkpointing is a typical approach to tolerate failures in today’s supercomputing clusters and computational grids. Checkpoint data can be saved either in central stable storage, or in processor memory (as in diskless checkpointing), or local disk space (replacing memory with local disk in diskless checkpointing). But where to save the checkpoint data has a great impact on the performance of a...

Diskless Checkpointing

The precursor to this work (where diskless checkpointing was first described) was presented at FTCS-24 [27]. Abstract: Diskless checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motiva...

Accommodating Logical Logging under Fuzzy Checkpointing in Main Memory Databases

This paper presents a simple and effective method to reduce the size of log data for recovery in main memory databases. Fuzzy checkpointing is known to be very efficient in main memory databases due to asynchronous backup activities. By this feature, most recovery works in the past have used only physical logging schemes. Since the size of physical log records is quite large, physical logging s...

Publication date: 2010